Introduction


Regulation and control play an important role in many
systems. Measurements of system performance are generally used
to steer the system toward the desired goal.

Such feedback mechanisms are necessary elements of many
processes. The immune response of the human body, the regulation
of air temperature in a house and the administration of drugs to
a patient by a physician all exhibit the same feedback process.

The theory of such controlled processes is highly developed and
has been applied in several areas of scientific research, in
engineering, in business systems and in government operations.

In this paper the elements of a control process are discussed
from the point of view of statistical applications. Several
important applications are pointed out where introducing a
feedback mechanism and developing an optimal control policy are
likely to improve the ultimate performance of the process.
Examples are given from patient care in the recovery room,
monitoring of air pollutants and dynamic economic models.

Models of control theory used in applications usually assume
that the dynamics of the process are completely known and that a
certain function measuring system performance is to be
optimized. Both deterministic and stochastic models are used in
practice, and the derivation of optimal control policies often
requires the well-known technique of dynamic programming.
However, the case in which the model involves unknown parameters
has not been treated well in the literature.

We provide a framework in which the parameters of the dynamic
model can be estimated from the data generated by the process.
Using certain well-known algorithms, these estimates are updated
as the process develops, and the optimal policy is then obtained.
Clearly, the policy will depend heavily on the random behavior of
the process, and these complications cannot be completely
avoided. We suggest using the statistical properties of the
estimates that enter the dynamic programming solution of the
control problem. A linear control process with a quadratic cost
criterion is used for illustration. Such solutions apply
directly to patient care problems, which have recently received
wide attention.


1.  Applications 

In this section we consider a few applications where control
theory could be used to advantage: the problem of patient care in
the recovery room and the problem of monitoring air pollutants.
There are many other areas, such as the study of dynamic economic
models, where the control mechanism is evident. A recent
comprehensive account of dynamic economic models has been given
by Chow (1975).

Patient Monitoring

In monitoring patients in surgery or in the recovery room,
elements of a feedback control process are in evidence. A common
procedure for monitoring the well-being of a fetus, for example,
is to monitor the level of creatinine in the amniotic fluid and
the excretion of estriol in maternal urine. Similarly, in the
management of pharmacologic intervention or in general patient
care, the nurse-patient-physician system acts as a feedback
control process; see, for example, Siefen et al. (1979).

The model of patient care as a control process and the resulting
dynamic programming solution were discussed by Rustagi (1968).
On-line computers have been applied to the administration of
patient care in several areas; for references, see Uaxman and
Stacy (1965), Hammond, Kirkendall and Calfee (1979), Sheppard,
Kirklin and Kouchoukos (1974), Sheppard, Kouchoukos, Shotts et
al. (1975), Sheppard and Kouchoukos (1976) and Pryor et al.
(1975).

We discuss one of these studies in more detail below.

Sheppard and Kouchoukos (1976) have described several situations
in which the feedback mechanism is implemented with the help of
electronic computers. For example, arterial blood pressure is
regulated automatically by monitoring blood pressure through
certain closed-loop mechanisms. Sheppard, Kirklin and Kouchoukos
(1974) have demonstrated that decision-making tasks for acutely
ill patients can actually be performed by computers. Such
decisions have been programmed on a computer to monitor patients
at the Alabama Medical Center for the analysis and treatment of
impaired cardiac performance. Table I gives the monitoring
logic, which is implemented automatically. Note that the authors
have demonstrated that a given set of logical decisions can be
performed automatically; these decisions are specified in
advance, and no attempt has been made to obtain the best possible
decision in a given situation.

Consider the case where several alternative procedures are known
to be practiced by the clinician. It would then be worthwhile to
choose the best possible decision under the assumption of a
certain optimality criterion.

Table I. Logic for Analysis and Treatment of Impaired Cardiac
Performance Early After Operation (Sheppard, Kirklin and
Kouchoukos, 1974)

Variables and ranges appearing in the table:
  Cardiac index (l/min/m²): less than 2
  Mean left atrial pressure (mm Hg): 7 or less; more than 18
  Mean arterial pressure (mm Hg): less than 100; more than 100
Treatments appearing in the table:
  Blood
  Epinephrine
  Epinephrine, dopamine or isoproterenol
  Epinephrine, dopamine or isoproterenol, plus trimethaphan or nitroprusside
  Nitroprusside
(The cell-by-cell layout of the table is not recoverable from the source scan.)

It should be noted that the feedback process in patient care
involves measurements of physiological and other variables,
which in general behave as random phenomena. Hence stochastic
control theory models are more appropriate for studying the
patient care process.

Air Pollution and Environmental Health

Another problem where optimal control theory can usefully be
applied occurs in environmental health. It has been demonstrated
that high levels of air pollutants are injurious to the health
and general well-being of living systems. Hence various forms of
governmental control have been established to regulate pollutants
in the environment, and legal and punitive action is taken
against those regarded as responsible for creating this hazardous
environment. Large industrial corporations are subject to such
control and regulation by the U.S. Environmental Protection
Agency.

In monitoring air pollutants for purposes of regulation,
elements of a stochastic control process are evident. Under the
various Clean Air Acts of the U.S. Government, the standards for
pollutants such as carbon monoxide, sulphur dioxide and
particulate matter are specified by law. As soon as pollutant
levels exceed certain limits, steps are taken to control the
various sources of emission. The structure of a feedback process
in the regulation of air pollution can easily be seen from this
process of law enforcement.

National Economic Models

In the national economy, certain controls are exercised by the
Federal Reserve Board through manipulation of the prime interest
rate, as well as through other steps that may affect the money
supply. Controlling the national economy requires knowledge of
the state of the economy in terms of several important variables,
so that appropriate action can be taken.

Suppose that the economy is described by the total amount of
consumer expenditure and private investment expenditure, and
that control can be exercised through government expenditure and
the total money supply. The optimality criterion in this case can
be taken to be minimization of the discrepancy between the growth
of consumption and private investment expenditure and certain
targeted increases of these expenditures. This discrepancy may be
formalized by a quadratic criterion. The model, with some assumed
numbers, has been described by Chow (1975). A similar discussion
of macroeconomic models with random parameters has been given by
Havenner and Grains (1973). For multiple time series, a recent
study of the same type is by Bovas (1980).
In the next section we describe the basic models of control
theory and discuss the linear model in some detail.


2.  Control Model


In most engineering and other applications of control theory,
the general assumption is that the dynamics of the system are
known; they are usually described by differential or difference
equations. Both deterministic and stochastic models are used, and
they lead to interesting problems depending on the type of
objective function used to optimize the controls. There is an
extensive literature on control theory, some of which is cited
in the references here; see, for example, Polak (1971),
Pshenichnyi (1971), Bertsekas (1976), and Gihman and Skorohod
(1979).

There are many situations in applications, such as patient care,
where the performance of the system is not realistically
described by deterministic models, and stochastic models must be
used instead. In economics, for example, one is confronted with
the problem of obtaining an optimal control policy when the
economic system is affected by a large number of uncertain
factors. Chow (1973) has considered the problem of finding an
optimal policy for dynamic economic systems. Not only are the
measurements in such systems random, but the form of the system
performance also has to be approximated by some hypothetical
model. In econometrics one generally uses a linear model relating
the output of the system to the input and the control used, and
optimal policies are derived under a quadratic cost criterion.
Besides being computationally feasible, such linear models
describe the phenomenon fairly well and have been treated fully
in various contexts in the literature.

Box and Jenkins (1968) have discussed statistical models for the
control of time series. We first discuss the case of a
deterministic control model. The system is generally described by
an n-dimensional state vector $x_t$ at time t. Let $u_t$ be the
p-dimensional control vector. For discrete time points
$t = 0, 1, 2, \dots, T$, the deterministic control process can be
described by the difference equation

$$x_{t+1} = g_t(x_t, u_t), \qquad t = 0, 1, 2, \dots, T-1 \qquad (2.1)$$

where the $g_t$ are known functions given for the system. If
feedback is present in the system, the present state of the
system is used to guide the system back to its normal operation.
That is, we assume that

$$u_{t+1} = h(x_t, z_t) \qquad (2.2)$$

that is, the discrepancy of the state vector from its desired or
target value $z_t$ is used to design the control.

The control is chosen so as to optimize the overall performance,
measured for a known function k by

$$J(x_0) = \sum_{t=0}^{T} k(x_t, z_t) \qquad (2.3)$$

Here $J(x_0)$ is implicitly a function of the controls
$u_0, u_1, \dots, u_{T-1}$.

The following forms of the function k are commonly used in
practice:

(1) $$k(x_t, z_t) = (x_t - z_t)'(x_t - z_t) + a\,u_t'u_t \qquad (2.4)$$

(2) $$k(x_t, z_t) = \begin{cases} a\,u_t'u_t, & t \le T-1, \\ (x_T - z_T)'(x_T - z_T), & t = T, \end{cases} \qquad (2.5)$$

where a is some constant.
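The Python below is a minimal sketch of (2.1) and (2.3)-(2.5) for a scalar state. It iterates the system for a given control sequence and then evaluates both cost forms. The transition function, the target sequence and the numbers in the usage lines are hypothetical, and so are the function names. Because no control exists at t = T, the sketch drops the control term from the final term of (2.4); that choice is an assumption.

```python
from typing import Callable, List, Sequence

def simulate(g: Callable[[int, float, float], float],
             x0: float, controls: Sequence[float]) -> List[float]:
    """Iterate x_{t+1} = g_t(x_t, u_t) as in (2.1); returns [x_0, ..., x_T]."""
    xs = [x0]
    for t, u in enumerate(controls):
        xs.append(g(t, xs[-1], u))
    return xs

def cost_24(xs: Sequence[float], us: Sequence[float],
            zs: Sequence[float], a: float) -> float:
    """J(x_0) with k from (2.4); assumption: no control term at t = T."""
    T = len(us)
    total = 0.0
    for t in range(T + 1):
        total += (xs[t] - zs[t]) ** 2
        if t < T:
            total += a * us[t] ** 2
    return total

def cost_25(xs: Sequence[float], us: Sequence[float],
            zs: Sequence[float], a: float) -> float:
    """J(x_0) with k from (2.5): control effort for t <= T-1, terminal deviation at t = T."""
    T = len(us)
    return sum(a * u ** 2 for u in us) + (xs[T] - zs[T]) ** 2

# Hypothetical example: x_{t+1} = 0.9 x_t + u_t, target z_t = 0, a = 0.1.
us = [0.5, -0.2]
xs = simulate(lambda t, x, u: 0.9 * x + u, 1.0, us)
print(xs, cost_24(xs, us, [0.0] * 3, 0.1), cost_25(xs, us, [0.0] * 3, 0.1))
```

Form (2.4) charges every deviation along the path. Form (2.5) charges only the final miss plus control effort, so it typically tolerates larger intermediate deviations.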

The basic problem of control theory is to find a set of controls
$u_0, u_1, \dots, u_{T-1}$ that optimizes J. Solutions to this
problem are generally obtained through the technique of dynamic
programming. There is an extensive literature on dynamic
programming; for a brief description of the technique, the
reader is referred to Rustagi (1976).
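Backward induction, the core of dynamic programming, can be shown on a problem small enough to check by brute force. The sketch below assumes a finite state grid and a finite control set. Both are hypothetical and were chosen only so that exhaustive enumeration is feasible. It computes the minimum cost-to-go $V_t(x)$ from $t = T$ back to $t = 0$ and records the minimizing control at each stage.

```python
from itertools import product

def backward_dp(states, controls, step, stage_cost, terminal_cost, T):
    """V[t][x] = min_u {stage_cost(t, x, u) + V[t+1][step(t, x, u)]}, computed for t = T-1, ..., 0."""
    V = [dict() for _ in range(T + 1)]
    policy = [dict() for _ in range(T)]
    for x in states:
        V[T][x] = terminal_cost(x)
    for t in range(T - 1, -1, -1):
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls:
                val = stage_cost(t, x, u) + V[t + 1][step(t, x, u)]
                if val < best_val:
                    best_u, best_val = u, val
            V[t][x], policy[t][x] = best_val, best_u
    return V, policy

def brute_force(x0, controls, step, stage_cost, terminal_cost, T):
    """Enumerate every control sequence of length T (only feasible for tiny problems)."""
    best = float("inf")
    for seq in product(controls, repeat=T):
        x, total = x0, 0.0
        for t, u in enumerate(seq):
            total += stage_cost(t, x, u)
            x = step(t, x, u)
        best = min(best, total + terminal_cost(x))
    return best

# Hypothetical example: integer states in [-5, 5], controls in {-1, 0, 1}, target 0.
states = range(-5, 6)
controls = (-1, 0, 1)
step = lambda t, x, u: max(-5, min(5, x + u))   # clipped so the next state stays on the grid
stage = lambda t, x, u: x * x + 0.5 * u * u     # k from (2.4) with z_t = 0, a = 0.5
terminal = lambda x: x * x
V, policy = backward_dp(states, controls, step, stage, terminal, T=4)
print(V[0][3], policy[0][3])
```

The recursion does work proportional to T × |states| × |controls|, while enumeration grows like |controls|^T. That difference is the practical point of the Principle of Optimality.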

The dynamic programming technique has been used extensively in
various applications. In defense contexts, it was suggested by
Rustagi and Doub (1970) for the optimum distribution of armor.
For an extensive coverage of the art and theory of dynamic
programming, see Dreyfus and Law (1977).

In the engineering literature a continuous process is usually
considered. Pontryagin's Maximum Principle was developed to give
a mathematical foundation for such control processes. For a brief
description of the Maximum Principle and its comparison with
dynamic programming, the reader is referred to Rustagi (1976).
Corresponding minimum principles have been advanced by several
authors; a recent paper of interest is by Varaiya and Walrand
(1980).

Stochastic Control Model

Suppose now that the state of a system at time t is given by an
n-dimensional random vector $x_t$. Let $u_t$ denote the
p-dimensional vector of controls at time t and $\epsilon_t$ an
n-dimensional vector of random disturbances. The control system
can usually be described by the equation

$$x_{t+1} = g_t(x_t, u_t, \epsilon_t), \qquad t = 1, 2, \dots, T-1 \qquad (2.6)$$

where $g_t$ is a sequence of known functions.

The system with feedback is shown in Figure 2.1. In this case the
deviation of the system state $x_t$ from a target value already
prescribed for it is used to adjust the system.

In the engineering literature it is usually assumed that $x_t$
cannot be observed directly; instead we observe $y_t$, subject to
some random error $\eta_t$. That is,

$$y_t = h(x_t, \eta_t) \qquad (2.7)$$

The performance index, or objective criterion function, of the
system now has to be some parameter of the distribution of the
cost, which is random. Since $J(x_0)$ is a random variable, we may
optimize $E(J(x_0))$. The existence and characterization of
optimal controls for stochastic systems are discussed in detail
by Aoki (1967). In the next section we discuss a linear control
process.
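To show what optimizing $E(J(x_0))$ involves, the sketch below estimates it by Monte Carlo for one hypothetical instance of (2.6) and (2.7). The instance is a scalar linear system $x_{t+1} = \alpha x_t + \beta u_t + \epsilon_t$ with observations $y_t = x_t + \eta_t$. Control follows the fixed feedback rule $u_t = -k\,y_t$ (target $z_t = 0$), and cost is form (2.4). The linear forms, the Gaussian noise and all numerical values are assumptions for illustration. The feedback acts on the observed state, not the true state.

```python
import random

def simulate_cost(x0, alpha, beta, gain, a, T, sd_eps, sd_eta, rng):
    """One realization of J(x_0) under the feedback rule u_t = -gain * y_t."""
    x, total = x0, 0.0
    for _ in range(T):
        y = x + rng.gauss(0.0, sd_eta)            # observation y_t = x_t + eta_t
        u = -gain * y                             # control uses the observed state
        total += x * x + a * u * u                # k(x_t, z_t) from (2.4) with z_t = 0
        x = alpha * x + beta * u + rng.gauss(0.0, sd_eps)   # state transition
    return total + x * x                          # terminal term at t = T

def expected_cost(n_rep, seed=0, **params):
    """Monte Carlo estimate of E(J(x_0)) averaged over n_rep independent runs."""
    rng = random.Random(seed)
    return sum(simulate_cost(rng=rng, **params) for _ in range(n_rep)) / n_rep

base = dict(x0=2.0, alpha=1.1, beta=0.5, gain=1.0, a=0.1, T=10)
j_det = expected_cost(1, sd_eps=0.0, sd_eta=0.0, **base)     # noise-free reference
j_mc = expected_cost(2000, sd_eps=0.5, sd_eta=0.5, **base)   # noisy system
print(j_det, j_mc)
```

In this linear setting with zero-mean noise, the mean trajectory matches the noise-free one. The noise can therefore only add to the expected quadratic cost, which the comparison of `j_mc` with `j_det` reflects.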


Figure 2.1. System with feedback.


3.  Linear System and Quadratic Cost

In this section we assume a linear control process and a
quadratic cost function. We first consider the univariate case.
Let

$$x_{t+1} = \alpha_t x_t + \beta_t u_t + \epsilon_t, \qquad t = 0, 1, 2, \dots \qquad (3.1)$$

where $\alpha_t$ and $\beta_t$ are the parameters of the model,
$x_t$ is the state of the system at time t, $u_t$ is the control
and the $\epsilon_t$ are random errors. Assume that it is not
$x_t$ but $y_t$ that is observed, where

$$y_t = x_t + \eta_t \qquad (3.2)$$

and the $\eta_t$ are random errors.

Consider, for example, the terminal control cost function

$$J = x_T^2 \qquad (3.3)$$

and let $y^{T-1} = (y_0, y_1, \dots, y_{T-1})$. To optimize $E(J)$
we use, in this case, Bellman's Principle of Optimality, which
leads to the dynamic programming technique. We first optimize
$E(x_T^2 \mid y^{T-1})$.

We assume here a more general error structure and prior
information than is usually assumed. Let the joint distribution
of $(\alpha_t, \beta_t, \epsilon_t, \eta_t)$ be multivariate normal
with mean $\mu^t$ and covariance matrix $\Sigma^t$, where

$$\mu^t = (\mu_\alpha^t, \mu_\beta^t, \mu_\epsilon^t, \mu_\eta^t)', \qquad \Sigma^t = (\sigma_{ij}^t), \quad i, j \in \{\alpha, \beta, \epsilon, \eta\} \qquad (3.4)$$

The expression for $E(x_T^2 \mid y^{T-1})$ can be simplified
further if we are given the conditional distribution of $x_t$
given $y^t$. Let

$$E(x_t \mid y^t) = v_t \qquad (3.5)$$

and

$$V(x_t \mid y^t) = \Delta_t \qquad (3.6)$$


We then have

[Equation (3.7), the resulting expression for $E(x_T^2 \mid y^{T-1})$, is illegible in the source scan.]

The optimal control $u_{T-1}^*$ is then obtained by minimizing
(3.7) with respect to $u_{T-1}$; it is given by

[Equation (3.8) is illegible in the source scan.]

Proceeding backwards in the same way, we obtain the remaining
optimal controls $u_{T-2}^*, u_{T-3}^*, \dots, u_0^*$.

When $\alpha_t$, $\beta_t$, $\epsilon_t$ and $\eta_t$ are assumed to
be mutually independent, the covariance matrix $\Sigma^t$ is
diagonal and expression (3.8) reduces to the ones usually found in
textbooks; see, for example, Aoki (1967) and DeGroot (1970).
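This sketch does not reconstruct (3.7) and (3.8). It works the one-step problem of minimizing $E(x_T^2 \mid y^{T-1})$ directly, under an added assumption: given $y^{T-1}$, $x_{T-1}$ has mean $v_{T-1}$ and variance $\Delta_{T-1}$ and is independent of $(\alpha_{T-1}, \beta_{T-1}, \epsilon_{T-1})$, whose means and covariances are known. Let $m_{ij} = \sigma_{ij} + \mu_i\mu_j$ denote the second moments. Expanding $(\alpha x + \beta u + \epsilon)^2$ then gives a quadratic in $u_{T-1}$. Its minimizer is $u_{T-1}^* = -(v_{T-1} m_{\alpha\beta} + m_{\beta\epsilon})/m_{\beta\beta}$. With independent components and $\mu_\epsilon = 0$, this reduces to the textbook form $-\mu_\alpha\mu_\beta v_{T-1}/(\sigma_{\beta\beta} + \mu_\beta^2)$. The numbers are hypothetical.

```python
from typing import Sequence

ALPHA, BETA, EPS = 0, 1, 2   # index order of (alpha_{T-1}, beta_{T-1}, eps_{T-1})

def _m(mean: Sequence[float], cov: Sequence[Sequence[float]], i: int, j: int) -> float:
    """Second moment E[z_i z_j] = cov_ij + mean_i * mean_j."""
    return cov[i][j] + mean[i] * mean[j]

def expected_terminal_cost(u, v, delta, mean, cov):
    """E[(alpha x + beta u + eps)^2] with x (mean v, variance delta) independent of (alpha, beta, eps)."""
    ex2 = delta + v * v                                   # E[x^2]
    return (_m(mean, cov, ALPHA, ALPHA) * ex2
            + _m(mean, cov, BETA, BETA) * u * u
            + _m(mean, cov, EPS, EPS)
            + 2 * u * v * _m(mean, cov, ALPHA, BETA)
            + 2 * v * _m(mean, cov, ALPHA, EPS)
            + 2 * u * _m(mean, cov, BETA, EPS))

def optimal_last_control(v, mean, cov):
    """Minimizer of the quadratic above, from setting its derivative in u to zero."""
    return -(v * _m(mean, cov, ALPHA, BETA) + _m(mean, cov, BETA, EPS)) / _m(mean, cov, BETA, BETA)

# Hypothetical numbers: correlated alpha and beta, zero-mean eps.
mean = [0.9, 0.5, 0.0]
cov = [[0.04, 0.01, 0.0],
       [0.01, 0.01, 0.0],
       [0.0, 0.0, 0.25]]
u_star = optimal_last_control(v=2.0, mean=mean, cov=cov)
print(u_star, expected_terminal_cost(u_star, 2.0, 0.1, mean, cov))
```

The correlation between $\alpha$ and $\beta$ enters only through $m_{\alpha\beta}$. Setting `cov[0][1] = cov[1][0] = 0` recovers the independent case.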


4.  Bayes Control Policies

In stochastic control theory, when the parameters of the model
are not known, Bayes methods are commonly used to obtain optimal
control policies. A general formulation of adaptive Bayes control
policies has recently been given by Suzuki (1979).

Consider the linear model

$$x_{t+1} = A_t x_t + B_t u_t + \epsilon_t, \qquad t = 0, 1, 2, \dots, T-1$$

where $x_t$ is a p-dimensional vector, $u_t$ is a q-dimensional
vector and the $\epsilon_t$ are p-dimensional random vectors.
Assume further that

$$y_t = H_t x_t + \eta_t, \qquad t = 0, 1, 2, \dots, T,$$

with $y_t$ an r-dimensional vector and $\eta_t$ a random vector.
The matrices $A_t$, $B_t$ and $H_t$ are assumed to be known. A
general assumption made for Bayes analysis is that the
distributions of $\epsilon_t$ and $\eta_t$ are known except for
some unknown parameters $\theta = (\theta_1, \theta_2)$. For
simplicity of computation, it is assumed that the $\epsilon_t$ have
multivariate normal distributions with unknown mean $\theta_1$ but
known covariance matrix $\Sigma_1$ for $t = 0, 1, 2, \dots, T-1$.
Similarly, the $\eta_t$ are assumed to be independent of the
$\epsilon_t$ and normally distributed with unknown mean $\theta_2$
and known covariance matrix $\Sigma_2$. The Bayes strategies are
computed by assuming prior distributions on $\theta_1$, $\theta_2$
and $x_0$; here their joint distribution is assumed to be a
completely known multivariate normal distribution. For a
performance function J of quadratic type, summed over
$t = 1, \dots, T$,

[The explicit expression for J is illegible in the source scan.]

with known matrices P, Q, S and R, the optimal policy can be
computed by dynamic programming, first for t = T and then moving
backwards. The details are given by Suzuki (1979).

In the practical problems described earlier, the matrices of the
model are not known. However, they can be estimated from data
already available to experimenters, provided the process under
control has been observed for a sufficiently long period of time.
This aspect of the control problem is discussed in the next
section.
5.  Statistical Considerations

Consider the case in which the control process has been observed
for a certain period of time. Assuming a form for the dynamics of
the process, such as linearity, the unknown parameters of the
process equations can be estimated from the data. These estimates
can then be used for statistical control.

Suppose the model at time t is

$$x_{t+1} = \alpha_t x_t + \beta_t u_t + \epsilon_t \qquad (5.1)$$

with suitable assumptions on the errors $\epsilon_t$. For
illustration, consider the situation in which the $\epsilon_t$ are
independently and identically distributed with mean 0 and
variance $\sigma^2$ and, for estimation purposes, the parameters
$\alpha_t$ and $\beta_t$ have remained constant during
$t = -T, -T+1, \dots, -1, 0$, say equal to $\alpha_0$ and $\beta_0$
respectively. In this case the least squares estimates are given
by the normal equations

$$\hat\alpha_0 \sum x_{t-1}^2 + \hat\beta_0 \sum x_{t-1}u_{t-1} = \sum x_{t-1}x_t, \qquad \hat\alpha_0 \sum x_{t-1}u_{t-1} + \hat\beta_0 \sum u_{t-1}^2 = \sum u_{t-1}x_t \qquad (5.2)$$

where the summation runs over the transitions observed in this
period, $t = -T+1, \dots, 0$.


It is well known that, under the additional assumption of
normally distributed errors, the above estimates are also the
maximum likelihood estimates of $\alpha_0$ and $\beta_0$. The
general system of equations when the model contains more
parameters can be obtained similarly; see, for example, Anderson
(1971, p. 183).
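The sketch below solves the normal equations (5.2) by Cramer's rule on data simulated from (5.1) with constant parameters. The simulated process, the uniformly random exploratory controls and the parameter values are hypothetical. The sketch also returns the residual-variance estimate with divisor n, which is the maximum likelihood form under normal errors. With noise-free data the estimates reproduce the true parameters up to rounding.

```python
import random

def simulate_process(alpha, beta, sigma, n, seed=1, x0=0.0):
    """Generate x_0..x_n and u_0..u_{n-1} from x_{t+1} = alpha x_t + beta u_t + eps_t."""
    rng = random.Random(seed)
    xs, us = [x0], []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)                 # hypothetical exploratory control
        us.append(u)
        xs.append(alpha * xs[-1] + beta * u + rng.gauss(0.0, sigma))
    return xs, us

def least_squares_ab(xs, us):
    """Solve the 2x2 normal equations (5.2) for (alpha_hat, beta_hat); also estimate sigma^2."""
    prev, nxt = xs[:-1], xs[1:]
    sxx = sum(x * x for x in prev)
    sxu = sum(x * u for x, u in zip(prev, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(prev, nxt))
    suy = sum(u * y for u, y in zip(us, nxt))
    det = sxx * suu - sxu * sxu
    alpha_hat = (sxy * suu - sxu * suy) / det
    beta_hat = (sxx * suy - sxu * sxy) / det
    resid = [y - alpha_hat * x - beta_hat * u for x, u, y in zip(prev, us, nxt)]
    sigma2_hat = sum(r * r for r in resid) / len(resid)   # divisor n (ML form)
    return alpha_hat, beta_hat, sigma2_hat

xs, us = simulate_process(alpha=0.8, beta=0.5, sigma=0.1, n=500)
print(least_squares_ab(xs, us))
```

The controls are drawn at random so that $x_{t-1}$ and $u_{t-1}$ are not collinear. A control fixed by a deterministic feedback rule $u_t = -k x_t$ would make the normal equations singular.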

In case $x_t$ is a p-dimensional vector and $u_t$ is a
q-dimensional vector of controls, we consider the model

$$x_{t+1} = A_t x_t + B_t u_t + \epsilon_t \qquad (5.3)$$

Here $A_t$ is a p × p matrix of unknown parameters, $B_t$ is a
p × q matrix of unknown parameters and $\epsilon_t$ is a
p-dimensional vector with mean 0 and covariance matrix $\Lambda$.
Under the same assumptions as in the univariate case, the normal
equations are

$$\begin{pmatrix} \sum x_{t-1}x_{t-1}' & \sum x_{t-1}u_{t-1}' \\ \sum u_{t-1}x_{t-1}' & \sum u_{t-1}u_{t-1}' \end{pmatrix} \begin{pmatrix} \hat A' \\ \hat B' \end{pmatrix} = \begin{pmatrix} \sum x_{t-1}x_t' \\ \sum u_{t-1}x_t' \end{pmatrix} \qquad (5.4)$$

and the estimate of $\Lambda$ is

$$\hat\Lambda = \frac{1}{n}\sum (x_t - \hat A x_{t-1} - \hat B u_{t-1})(x_t - \hat A x_{t-1} - \hat B u_{t-1})'$$

where n is the number of terms in the sum. The detailed
development of the estimates and the associated theory is given
by Anderson (1971, p. 203).
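For the multivariate model, the normal equations form a linear system with $p + q$ rows and $p$ right-hand sides. The sketch below builds $Z'Z$ and $Z'X$ from the stacked regressors $(x_{t-1}', u_{t-1}')$ and solves them in plain Python with Gauss-Jordan elimination and partial pivoting. It then computes the residual covariance with divisor n. The two-state, one-control system and its controls are hypothetical. For larger problems a numerical library's least squares routine would normally be preferable.

```python
import random

def solve(M, R):
    """Solve M X = R (M square, R with one or more columns) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(row) + list(rhs) for row, rhs in zip(M, R)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))   # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def estimate_system(xs, us):
    """Least squares A_hat, B_hat and Lambda_hat from x_t = A x_{t-1} + B u_{t-1} + eps_t."""
    p, q = len(xs[0]), len(us[0])
    k = p + q
    Z = [list(x) + list(u) for x, u in zip(xs[:-1], us)]   # rows (x_{t-1}', u_{t-1}')
    X = xs[1:]                                             # rows x_t'
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    ZtX = [[sum(z[i] * x[j] for z, x in zip(Z, X)) for j in range(p)] for i in range(k)]
    theta = solve(ZtZ, ZtX)                                # stacked (A'; B'), k x p
    A = [[theta[j][i] for j in range(p)] for i in range(p)]
    B = [[theta[p + j][i] for j in range(q)] for i in range(p)]
    resid = [[x[i] - sum(A[i][j] * z[j] for j in range(p))
              - sum(B[i][j] * z[p + j] for j in range(q)) for i in range(p)]
             for z, x in zip(Z, X)]
    n = len(X)
    Lam = [[sum(r[i] * r[j] for r in resid) / n for j in range(p)] for i in range(p)]
    return A, B, Lam

def simulate_system(A, B, x0, us, sd, seed=3):
    """Generate x_0..x_n from the model with i.i.d. N(0, sd^2) noise in each component."""
    rng = random.Random(seed)
    xs = [list(x0)]
    for u in us:
        x = xs[-1]
        xs.append([sum(A[i][j] * x[j] for j in range(len(x)))
                   + sum(B[i][j] * u[j] for j in range(len(u)))
                   + rng.gauss(0.0, sd) for i in range(len(x))])
    return xs

# Hypothetical 2-state, 1-control system.
A_true = [[0.5, 0.1], [0.0, 0.7]]
B_true = [[1.0], [0.5]]
rng = random.Random(2)
us = [[rng.uniform(-1.0, 1.0)] for _ in range(200)]
xs = simulate_system(A_true, B_true, [1.0, -1.0], us, sd=0.0)
A_hat, B_hat, Lam_hat = estimate_system(xs, us)
print(A_hat, B_hat, Lam_hat)
```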

The control process will use $\hat\alpha_0$ and $\hat\beta_0$ as
initial estimates and update them as data accumulate at each
stage of the decision-making process, as t goes from 0 to T.
Some simpler forms of recursive estimates, given by Albert and
Gardner (1966) and Albert (1972), are discussed next. They can be
implemented in any adaptive estimation scheme.


Recursive Estimates

Most of the work on control theory in engineering applications is
confined to the case in which the parameters of the model are
known and are not modified during the operation of the control
process. When the joint density of the unknown parameters is
modified continuously as new observations become available, the
calculations become difficult. Instead of the Bayesian approach,
in which a prior density of the parameters is given, we henceforth
use the classical estimates of the parameters given in (5.2) or
(5.4), and we update them by a recursive estimation procedure
that uses the observations and controls of the process during
its operation.

Let the linear model (5.1) be simplified by writing
$\theta_t = (\alpha_t, \beta_t)'$ and $h_t = (x_t, u_t)'$, so that
(5.1) becomes

$$x_{t+1} = h_t'\theta_t + \epsilon_t \qquad (5.5)$$

Let $\hat\theta_j$ be the estimate of θ based on the data observed
up to time j, and suppose the state $x_{j+1}$ is then observed. We
would like a procedure to update $\hat\theta_j$ using $x_{j+1}$.

Definition: $\hat\theta_{j+1}$ is a recursive estimate of θ if it
is obtained by updating $\hat\theta_j$ with the new observation,
that is,

$$\hat\theta_{j+1} = K_j(\hat\theta_j, x_{j+1}) \qquad (5.6)$$

The recursive estimate of the differential type suggested by
Albert and Gardner (1967, p. 111) is

$$\hat\theta_{j+1} = \hat\theta_j + a_j\,(x_{j+1} - h_j'\hat\theta_j), \qquad j = 0, 1, 2, \dots \qquad (5.7)$$

with

$$a_j = \frac{\Gamma_j h_j}{1 + h_j'\Gamma_j h_j}, \qquad \Gamma_{j+1} = \Gamma_j - a_j h_j'\Gamma_j, \qquad (5.8)$$

and initial values $\hat\theta_0$ and $\Gamma_0$.


The above recursive estimates can be written in closed form if we
are given the initial estimates $\hat\theta_0$ and $\Gamma_0$. In
that case

$$\Gamma_k = \Big(\Gamma_0^{-1} + \sum_{i=0}^{k-1} h_i h_i'\Big)^{-1}, \qquad k = 1, 2, \dots \qquad (5.9)$$

and

$$\hat\theta_k = \Gamma_k\Big(\Gamma_0^{-1}\hat\theta_0 + \sum_{i=0}^{k-1} h_i x_{i+1}\Big), \qquad k \ge 1 \qquad (5.10)$$

The convergence and other optimality properties of these
recursive estimates are described by Albert and Gardner in their
book.
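The sketch below implements recursive least squares in the form just described, for $\theta = (\alpha, \beta)'$. The gain is $a_j = \Gamma_j h_j/(1 + h_j'\Gamma_j h_j)$ and the update is $\Gamma_{j+1} = \Gamma_j - a_j h_j'\Gamma_j$. The result is checked against the equivalent batch formula $\hat\theta_k = (\Gamma_0^{-1} + \sum h_i h_i')^{-1}(\Gamma_0^{-1}\hat\theta_0 + \sum h_i x_{i+1})$. Processing the observations one at a time should give the same estimate as the batch computation. The simulated data and the starting values $\hat\theta_0 = 0$, $\Gamma_0 = 100 I$ are assumptions.

```python
import random

def rls_update(theta, gamma, h, x_next):
    """theta <- theta + a (x_next - h'theta), Gamma <- Gamma - a h'Gamma, with a = Gamma h / (1 + h'Gamma h)."""
    gh = [gamma[0][0] * h[0] + gamma[0][1] * h[1],
          gamma[1][0] * h[0] + gamma[1][1] * h[1]]            # Gamma_j h_j
    denom = 1.0 + h[0] * gh[0] + h[1] * gh[1]
    a = [gh[0] / denom, gh[1] / denom]                         # gain a_j
    err = x_next - (h[0] * theta[0] + h[1] * theta[1])         # one-step prediction error
    theta_new = [theta[0] + a[0] * err, theta[1] + a[1] * err]
    # h_j' Gamma_j equals (Gamma_j h_j)' because Gamma_j is symmetric.
    gamma_new = [[gamma[i][j] - a[i] * gh[j] for j in range(2)] for i in range(2)]
    return theta_new, gamma_new

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def batch_estimate(theta0, gamma0, hs, x_next):
    """Closed form (Gamma_0^{-1} + sum h h')^{-1} (Gamma_0^{-1} theta_0 + sum h x)."""
    g0inv = inv2(gamma0)
    info = [row[:] for row in g0inv]
    rhs = [g0inv[0][0] * theta0[0] + g0inv[0][1] * theta0[1],
           g0inv[1][0] * theta0[0] + g0inv[1][1] * theta0[1]]
    for h, x in zip(hs, x_next):
        for i in range(2):
            rhs[i] += h[i] * x
            for j in range(2):
                info[i][j] += h[i] * h[j]
    g = inv2(info)
    return [g[0][0] * rhs[0] + g[0][1] * rhs[1], g[1][0] * rhs[0] + g[1][1] * rhs[1]]

# Hypothetical data from x_{t+1} = 0.8 x_t + 0.5 u_t + eps_t.
rng = random.Random(4)
x, hs, x_next = 0.0, [], []
for _ in range(300):
    u = rng.uniform(-1.0, 1.0)
    x_new = 0.8 * x + 0.5 * u + rng.gauss(0.0, 0.1)
    hs.append((x, u))
    x_next.append(x_new)
    x = x_new

theta, gamma = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for h, xn in zip(hs, x_next):
    theta, gamma = rls_update(theta, gamma, h, xn)
batch = batch_estimate([0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]], hs, x_next)
print(theta, batch)
```

Each recursive step costs a fixed amount of work, whatever the number of past observations. That is what makes this form suitable for updating estimates while the process is under control.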

A recent survey of recursive estimation procedures, based on the
concept of innovations sequences due to Kailath (1968), has been
given by Davis (1977). However, most of this survey is concerned
with continuous processes.

Assuming now that the process equations are being updated
according to the procedure described above in equations
(5.6)-(5.10), we take the model for control to be

$$x_{t+1} = \hat\alpha_t x_t + \hat\beta_t u_t \qquad (5.11)$$

Let

$$x^{t-1} = (x_0, x_1, \dots, x_{t-1}), \qquad u^{t-1} = (u_0, u_1, \dots, u_{t-1}), \qquad t = 1, \dots, T.$$

Let the performance function for this process be

$$J = \sum_{t=0}^{T-1} (a_t x_t^2 + b_t u_t^2) + c\,x_T^2 \qquad (5.12)$$

where $a_t$, $b_t$ and c are known for $t = 0, 1, \dots, T-1$. The
optimization procedure is concerned with finding
$u_0, u_1, \dots, u_{T-1}$ so as to minimize

$$E(J \mid x^{T-1}, u^{T-1}).$$

Let $V_t(x_t)$ be the minimum cost when the process is at stage t
with state variable $x_t$. Note that $V_t(x_t)$, and the discussion
that follows, is conditional on $x^{t-1}$ and $u^{t-1}$; this fact
is not expressed in the notation.

Using the Principle of Optimality of dynamic programming,

$$V_t(x_t) = \min_{u_t}\{a_t x_t^2 + b_t u_t^2 + V_{t+1}(\hat\alpha_t x_t + \hat\beta_t u_t)\}, \qquad t = 0, 1, \dots, T-1,$$

and

$$V_T(x_T) = c\,x_T^2 \qquad (5.13)$$
Starting backwards, we first consider

$$V_{T-1}(x_{T-1}) = \min_{u_{T-1}}\{a_{T-1}x_{T-1}^2 + b_{T-1}u_{T-1}^2 + c(\hat\alpha_{T-1}x_{T-1} + \hat\beta_{T-1}u_{T-1})^2\}.$$

Setting the derivative with respect to $u_{T-1}$ equal to zero
gives the optimal value of $u_{T-1}$:

$$u_{T-1}^* = -\frac{c\,\hat\alpha_{T-1}\hat\beta_{T-1}}{b_{T-1} + c\,\hat\beta_{T-1}^2}\,x_{T-1} \qquad (5.14)$$

and the corresponding value of the cost function is

$$V_{T-1}(x_{T-1}) = \Big(a_{T-1} + c\,\hat\alpha_{T-1}^2 - \frac{c^2\hat\alpha_{T-1}^2\hat\beta_{T-1}^2}{b_{T-1} + c\,\hat\beta_{T-1}^2}\Big)x_{T-1}^2 \qquad (5.15)$$

or

$$V_{T-1}(x_{T-1}) = p_{T-1}\,x_{T-1}^2, \qquad (5.16)$$

where $p_{T-1}$ is the coefficient of $x_{T-1}^2$ in (5.15).
Similarly, we obtain

$$u_{T-2}^*(x_{T-2}) = -\frac{p_{T-1}\hat\alpha_{T-2}\hat\beta_{T-2}}{b_{T-2} + p_{T-1}\hat\beta_{T-2}^2}\,x_{T-2}$$

and

$$V_{T-2}(x_{T-2}) = p_{T-2}\,x_{T-2}^2. \qquad (5.17)$$

In general, following the above process, one obtains the optimal
controls

$$u_t^*(x_t) = -\frac{p_{t+1}\hat\alpha_t\hat\beta_t}{b_t + p_{t+1}\hat\beta_t^2}\,x_t, \qquad (5.18)$$

$$V_t(x_t) = p_t\,x_t^2, \qquad (5.19)$$

with

$$p_t = a_t + p_{t+1}\hat\alpha_t^2 - \frac{p_{t+1}^2\hat\alpha_t^2\hat\beta_t^2}{b_t + p_{t+1}\hat\beta_t^2}, \qquad t = 0, 1, \dots, T-1, \qquad (5.20)$$

and

$$p_T = c. \qquad (5.21)$$
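The backward recursion (5.18)-(5.21) takes only a few lines of code. In the sketch below, the estimates $\hat\alpha_t, \hat\beta_t$ and the weights $a_t, b_t, c$ are hypothetical inputs, and the helper names are illustrative. As a check, applying the feedback $u_t^* = -g_t x_t$ to the model (5.11) from $x_0$ should give a total cost of exactly $p_0 x_0^2$. Perturbing any single control should not lower that cost.

```python
def riccati(alpha, beta, a, b, c):
    """Backward recursion (5.20)-(5.21); returns p_0..p_T and gains g_t with u_t* = -g_t x_t (5.18)."""
    T = len(alpha)
    p = [0.0] * (T + 1)
    gains = [0.0] * T
    p[T] = c                                                   # (5.21)
    for t in range(T - 1, -1, -1):
        denom = b[t] + p[t + 1] * beta[t] ** 2
        gains[t] = p[t + 1] * alpha[t] * beta[t] / denom       # (5.18)
        p[t] = a[t] + p[t + 1] * alpha[t] ** 2 - (p[t + 1] * alpha[t] * beta[t]) ** 2 / denom  # (5.20)
    return p, gains

def total_cost(x0, us, alpha, beta, a, b, c):
    """J of (5.12) along the deterministic model (5.11) for a given control sequence."""
    x, J = x0, 0.0
    for t, u in enumerate(us):
        J += a[t] * x * x + b[t] * u * u
        x = alpha[t] * x + beta[t] * u
    return J + c * x * x

def run_policy(x0, gains, alpha, beta):
    """Controls generated by the feedback u_t = -g_t x_t along the model (5.11)."""
    x, us = x0, []
    for t, g in enumerate(gains):
        u = -g * x
        us.append(u)
        x = alpha[t] * x + beta[t] * u
    return us

# Hypothetical estimates and weights over a horizon T = 5.
T = 5
alpha, beta = [1.05] * T, [0.4] * T
a, b, c = [1.0] * T, [0.2] * T, 2.0
p, gains = riccati(alpha, beta, a, b, c)
us = run_policy(3.0, gains, alpha, beta)
print(p[0] * 9.0, total_cost(3.0, us, alpha, beta, a, b, c))
```

For T = 1 the first step of the loop reproduces (5.14) and (5.15) exactly, with $p_T = c$ in the role of the terminal weight.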

Many other performance functions can be used in the same way to
obtain the optimal policy. A large number of such problems in the
deterministic case are discussed in a recent survey of the art and
theory of dynamic programming by Dreyfus and Law (1977).

The stochastic behavior of the optimal control policy seems
fairly complicated in general. For $u_{T-1}^*$ given by (5.14), we
find that it is approximately the ratio of a product and a square
of random variables, which themselves have complicated
distributions. In practical situations we shall assume that the
number of observations on which the estimates of $\alpha_t$ and
$\beta_t$ are based is large. By asymptotic theory,
$\hat\alpha_t \to \alpha_t$ and $\hat\beta_t \to \beta_t$ in
probability, which justifies using the optimal control policy
(5.18)-(5.21) with the estimates in place of the unknown
parameters.
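The simulation below illustrates this certainty-equivalence argument under stated assumptions. It estimates constant $\alpha, \beta$ by least squares from a hypothetical record of n transitions. It then computes the gains (5.18)-(5.21) from the estimates and evaluates the resulting feedback on the true, noise-free dynamics. The benchmark is the policy computed from the true values. All numbers are assumptions, and the sketch treats the parameters as constant rather than updating them during control.

```python
import random

def ls_estimate(n, alpha, beta, sigma, seed=5):
    """Least squares (alpha_hat, beta_hat) from n simulated transitions with random controls."""
    rng = random.Random(seed)
    x = 0.0
    sxx = sxu = suu = sxy = suy = 0.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        y = alpha * x + beta * u + rng.gauss(0.0, sigma)
        sxx += x * x; sxu += x * u; suu += u * u; sxy += x * y; suy += u * y
        x = y
    det = sxx * suu - sxu * sxu
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det

def gains_for(alpha, beta, a, b, c, T):
    """Feedback gains from (5.18)-(5.21) with constant parameters and weights."""
    p, gains = c, [0.0] * T
    for t in range(T - 1, -1, -1):
        denom = b + p * beta ** 2
        gains[t] = p * alpha * beta / denom
        p = a + p * alpha ** 2 - (p * alpha * beta) ** 2 / denom
    return gains

def true_cost(gains, x0, alpha, beta, a, b, c):
    """Cost (5.12) when u_t = -g_t x_t is applied to the true dynamics."""
    x, J = x0, 0.0
    for g in gains:
        u = -g * x
        J += a * x * x + b * u * u
        x = alpha * x + beta * u
    return J + c * x * x

ALPHA, BETA, A, B, C, T, X0 = 0.9, 0.5, 1.0, 0.1, 1.0, 20, 1.0
j_opt = true_cost(gains_for(ALPHA, BETA, A, B, C, T), X0, ALPHA, BETA, A, B, C)
for n in (20, 200, 2000):
    a_hat, b_hat = ls_estimate(n, ALPHA, BETA, sigma=0.1)
    j_ce = true_cost(gains_for(a_hat, b_hat, A, B, C, T), X0, ALPHA, BETA, A, B, C)
    print(n, a_hat, b_hat, j_ce / j_opt)
```

The cost is stationary at the optimal gains, so small estimation errors raise the cost only to second order. The ratio printed in the last column approaches 1 as n grows.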

These estimates, or their generalizations, can be used in the
various applications discussed earlier.